feat: v0.12.0 Model Capability Tiering#88

Merged
Shreyas582 merged 1 commit into main from v0.12.0-capability-tiering
Apr 3, 2026

Conversation

@Shreyas582
Owner

v0.12.0 — Model Capability Tiering

Implements all 7 issues in the v0.12.0 milestone (#75-#81).

Changes

inference_bridge

  • ModelCapabilityProbe struct with estimated_param_billions, execution_provider, smoke_latency_ms, vocab_size
  • probe_model_capability(config) — estimates params from file size, detects EP, estimates smoke latency, extracts vocab from tokenizer JSON
  • Non-onnx/dry-run builds return sensible defaults
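
To illustrate the file-size heuristic described above, here is a minimal sketch rather than the code from this PR. The field types, the `ASSUMED_BYTES_PER_PARAM` constant, the helper `estimate_param_billions`, and the values in `default_probe` are all assumptions made for the example; only the struct name and field names come from the description.

```rust
/// Hypothetical mirror of the probe fields listed above; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelCapabilityProbe {
    pub estimated_param_billions: f64,
    pub execution_provider: String,
    pub smoke_latency_ms: u64,
    pub vocab_size: usize,
}

/// Assumed average storage per parameter (2.0 would correspond to fp16 weights).
pub const ASSUMED_BYTES_PER_PARAM: f64 = 2.0;

/// Estimate the parameter count, in billions, from the model file size.
/// Hypothetical helper: the real estimator and its constants may differ.
pub fn estimate_param_billions(file_size_bytes: u64, bytes_per_param: f64) -> f64 {
    (file_size_bytes as f64 / bytes_per_param) / 1e9
}

/// Placeholder defaults for a build without a real runtime.
/// Every value here is illustrative, not taken from the PR.
pub fn default_probe() -> ModelCapabilityProbe {
    ModelCapabilityProbe {
        estimated_param_billions: 0.0,
        execution_provider: "cpu".to_string(),
        smoke_latency_ms: 0,
        vocab_size: 0,
    }
}
```

Under these assumptions, a 14 GB file stored at 2 bytes per parameter gives `estimate_param_billions(14_000_000_000, ASSUMED_BYTES_PER_PARAM)`, roughly 7 billion parameters. A quantized model would need a smaller bytes-per-parameter value, which is why the divisor is a parameter in this sketch.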

core_engine

  • ModelCapabilityTier enum (Basic, Moderate, Strong) with Serialize/Deserialize
  • classify_capability(probe) with const thresholds — final tier = min(param_tier, latency_tier)
  • basic_tier_summary(findings) — structured SUMMARY/FINDINGS/RISK/ACTIONS format
  • ModelCapabilityReport struct for JSON output
  • Agent now accepts capability_tier and adapts Phase 2:
    • Basic: skips LLM, uses deterministic summary
    • Moderate: LLM with reduced evidence (top-5 observations)
    • Strong: full evidence, full synthesis (current behavior)
  • model_capability field added to RunReport
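
The `min(param_tier, latency_tier)` rule and the per-tier evidence handling can be sketched as follows. This is an illustration under stated assumptions: the threshold constants are placeholders and not the values used in the PR, `classify_capability` takes two numbers here instead of a probe to keep the example short, and `evidence_for_tier` is a hypothetical helper that assumes the observations are already ranked.

```rust
/// Variant order matters: the derived `Ord` gives Basic < Moderate < Strong.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ModelCapabilityTier {
    Basic,
    Moderate,
    Strong,
}

// Placeholder thresholds for illustration only.
pub const MODERATE_MIN_PARAMS_B: f64 = 1.0;
pub const STRONG_MIN_PARAMS_B: f64 = 7.0;
pub const STRONG_MAX_LATENCY_MS: u64 = 500;
pub const MODERATE_MAX_LATENCY_MS: u64 = 2_000;

/// Tier implied by the parameter estimate alone.
fn param_tier(params_b: f64) -> ModelCapabilityTier {
    if params_b >= STRONG_MIN_PARAMS_B {
        ModelCapabilityTier::Strong
    } else if params_b >= MODERATE_MIN_PARAMS_B {
        ModelCapabilityTier::Moderate
    } else {
        ModelCapabilityTier::Basic
    }
}

/// Tier implied by the smoke latency alone (lower latency is better).
fn latency_tier(latency_ms: u64) -> ModelCapabilityTier {
    if latency_ms <= STRONG_MAX_LATENCY_MS {
        ModelCapabilityTier::Strong
    } else if latency_ms <= MODERATE_MAX_LATENCY_MS {
        ModelCapabilityTier::Moderate
    } else {
        ModelCapabilityTier::Basic
    }
}

/// Final tier is the weaker of the two signals.
pub fn classify_capability(params_b: f64, latency_ms: u64) -> ModelCapabilityTier {
    param_tier(params_b).min(latency_tier(latency_ms))
}

/// Evidence passed to the LLM in Phase 2; `None` means the LLM is skipped.
pub fn evidence_for_tier(tier: ModelCapabilityTier, observations: &[String]) -> Option<&[String]> {
    match tier {
        ModelCapabilityTier::Basic => None,
        ModelCapabilityTier::Moderate => Some(&observations[..observations.len().min(5)]),
        ModelCapabilityTier::Strong => Some(observations),
    }
}
```

Taking the minimum means a large model on slow hardware is still demoted: with these placeholder thresholds, `classify_capability(8.0, 5_000)` yields `Basic` even though the parameter count alone would qualify as `Strong`.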

cli

  • --capability-override basic|moderate|strong flag — forces tier, skips probe, adds override: true in output
  • Probe + classify wired into run_agent_once()
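
The override behavior (force the tier, skip the probe, flag the override in the output) might look roughly like the sketch below. `parse_override` and `resolve_tier` are hypothetical names introduced for this example, the closure stands in for the probe-and-classify step, and matching only lowercase values is an assumption based on the flag's documented choices.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelCapabilityTier {
    Basic,
    Moderate,
    Strong,
}

/// Parse the value given to `--capability-override` (hypothetical helper).
pub fn parse_override(value: &str) -> Result<ModelCapabilityTier, String> {
    match value {
        "basic" => Ok(ModelCapabilityTier::Basic),
        "moderate" => Ok(ModelCapabilityTier::Moderate),
        "strong" => Ok(ModelCapabilityTier::Strong),
        other => Err(format!(
            "invalid capability override '{other}': expected basic, moderate, or strong"
        )),
    }
}

/// Returns the tier to use plus an `override` flag for the report.
/// The probe closure runs only when no override was supplied.
pub fn resolve_tier<F>(
    override_value: Option<&str>,
    probe_and_classify: F,
) -> Result<(ModelCapabilityTier, bool), String>
where
    F: FnOnce() -> ModelCapabilityTier,
{
    match override_value {
        Some(value) => Ok((parse_override(value)?, true)),
        None => Ok((probe_and_classify(), false)),
    }
}
```

Passing the probe as a closure keeps the skip explicit: when an override is present the closure is never called, so no model file is touched.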

schemas

  • model_capability and max_severity added to run-report schema
  • Example JSON updated

Tests

  • 18 new unit tests covering classification boundaries, tier ordering, serialization, basic tier summary format, and capability report output
  • All 141 tests, existing and new, pass
  • cargo fmt and cargo clippy clean

Closes #75, closes #76, closes #77, closes #78, closes #79, closes #80, closes #81

- Add ModelCapabilityProbe struct and probe_model_capability() in
  inference_bridge (#76): estimates params from file size, detects EP,
  measures smoke latency, extracts vocab size from tokenizer

- Add ModelCapabilityTier enum (Basic/Moderate/Strong) with
  classify_capability() in core_engine (#77): const thresholds,
  tier = min(param_tier, latency_tier)

- Agent adapts Phase 2 by tier (#78): Basic skips LLM entirely,
  Moderate uses reduced evidence, Strong uses full synthesis

- Basic tier deterministic summary (#79): structured SUMMARY/FINDINGS/
  RISK/ACTIONS format, byte-identical across runs

- ModelCapabilityReport in RunReport JSON output (#80): tier, params,
  EP, latency, vocab_size; absent in dry-run mode

- --capability-override CLI flag (#81): forces tier, skips probe,
  adds override:true in output

- 18 new unit tests for classification boundaries, tier ordering,
  serialization, basic tier summary, and capability report
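
One way to get a byte-identical Basic tier summary (#79) is to sort the findings with a total order before rendering. The sketch below assumes a hypothetical `Finding` type with a numeric severity and a title; the actual finding type, the wording of each section, and the sort key in this PR are not shown in the description and may differ.

```rust
/// Hypothetical finding type for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub severity: u8,
    pub title: String,
}

/// Render a deterministic SUMMARY/FINDINGS/RISK/ACTIONS block.
/// Output depends only on the set of findings, not on their input order.
pub fn basic_tier_summary(findings: &[Finding]) -> String {
    // Highest severity first, ties broken by title, so the order is total.
    let mut sorted: Vec<&Finding> = findings.iter().collect();
    sorted.sort_by(|a, b| {
        b.severity
            .cmp(&a.severity)
            .then_with(|| a.title.cmp(&b.title))
    });

    let max_severity = sorted.first().map(|f| f.severity).unwrap_or(0);

    let mut out = String::new();
    out.push_str(&format!("SUMMARY: {} finding(s)\n", sorted.len()));
    out.push_str("FINDINGS:\n");
    for f in &sorted {
        out.push_str(&format!("- [{}] {}\n", f.severity, f.title));
    }
    out.push_str(&format!("RISK: max severity {}\n", max_severity));
    if sorted.is_empty() {
        out.push_str("ACTIONS: none\n");
    } else {
        out.push_str("ACTIONS: review findings in the order listed\n");
    }
    out
}
```

Because no timestamps, hash-map iteration, or LLM output enter the string, two runs over the same findings produce the same bytes regardless of the order in which the findings were collected.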

Closes #75, closes #76, closes #77, closes #78, closes #79, closes #80, closes #81
Shreyas582 added this to the v0.12.0 milestone on Apr 3, 2026
Shreyas582 merged commit ac7db6e into main on Apr 3, 2026
10 checks passed
Shreyas582 deleted the v0.12.0-capability-tiering branch on April 3, 2026 at 21:01